{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Theano 在 Windows 上的配置 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<font color=red>注意：不建议在 `windows` 进行 `theano` 的配置。</font>\n",
    "\n",
    "<font color=red>务必确认你的显卡支持 `CUDA`。</font>\n",
    "\n",
    "我个人的电脑搭载的是 `Windows 10 x64` 系统，显卡是 `Nvidia GeForce GTX 850M`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 安装 theano"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "首先是用 `anaconda` 安装 `theano`：\n",
    "\n",
    "    conda install mingw libpython\n",
    "    pip install theano"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 安装 VS 和 CUDA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "按顺序安装这两个软件：\n",
    "- 安装 Visual Studio 2010/2012/2013\n",
    "- 安装 对应的 x64 或 x86 CUDA\n",
    "\n",
    "Cuda 的版本与电脑的显卡兼容。\n",
    "\n",
    "我安装的是 Visual Studio 2012 和 CUDA v7.0v。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 配置环境变量"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`CUDA` 会自动帮你添加一个 `CUDA_PATH` 环境变量（环境变量在 控制面板->系统与安全->系统->高级系统设置 中），表示你的 `CUDA` 安装位置，我的电脑上为：\n",
    "\n",
    "- `CUDA_PATH`\n",
    "    - `C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v7.0`\n",
    "\n",
    "我们配置两个相关变量：\n",
    "\n",
    "- `CUDA_BIN_PATH`\n",
    "    - `%CUDA_PATH%\\bin`\n",
    "- `CUDA_LIB_PATH`\n",
    "    - `%CUDA_PATH%\\lib\\Win32`\n",
    "\n",
    "接下来在 `Path` 环境变量的后面加上：\n",
    "\n",
    "- `Minicoda` 中关于 `mingw` 的项：\n",
    "    - `C:\\Miniconda\\MinGW\\bin;`\n",
    "    - `C:\\Miniconda\\MinGW\\x86_64-w64-mingw32\\lib;`\n",
    "\n",
    "- `VS` 中的 `cl` 编译命令： \n",
    "    - `C:\\Program Files (x86)\\Microsoft Visual Studio 11.0\\VC\\bin;`\n",
    "    - `C:\\Program Files (x86)\\Microsoft Visual Studio 11.0\\Common7\\IDE;`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "生成测试文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Writing test_theano.py\n"
     ]
    }
   ],
   "source": [
    "%%file test_theano.py\n",
    "from theano import config\n",
    "print 'using device:', config.device"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以通过临时设置环境变量 `THEANO_FLAGS` 来改变 `theano` 的运行模式，在 linux 下，临时环境变量直接用：\n",
    "\n",
    "    THEANO_FLAGS=xxx \n",
    "    \n",
    "就可以完成，设置完成之后，该环境变量只在当前的命令窗口有效，你可以这样运行你的代码：\n",
    "\n",
    "    THEANO_FLAGS=xxx python <your script>.py\n",
    "    \n",
    "在 `Windows` 下，需要使用 `set` 命令来临时设置环境变量，所以运行方式为：\n",
    "    \n",
    "    set THEANO_FLAGS=xxx && python <your script>.py "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "using device: cpu\r\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "\n",
    "if sys.platform == 'win32':\n",
    "    !set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py\n",
    "else:\n",
    "    !THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using gpu device 0: Tesla C2075 (CNMeM is disabled)\n",
      "using device: gpu\n"
     ]
    }
   ],
   "source": [
    "if sys.platform == 'win32':\n",
    "    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py\n",
    "else:\n",
    "    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "测试 `CPU` 和 `GPU` 的差异："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_theano.py\n"
     ]
    }
   ],
   "source": [
    "%%file test_theano.py\n",
    "\n",
    "from theano import function, config, shared, sandbox\n",
    "import theano.tensor as T\n",
    "import numpy\n",
    "import time\n",
    "\n",
    "vlen = 10 * 30 * 768  # 10 x #cores x # threads per core\n",
    "iters = 1000\n",
    "\n",
    "rng = numpy.random.RandomState(22)\n",
    "x = shared(numpy.asarray(rng.rand(vlen), config.floatX))\n",
    "f = function([], T.exp(x))\n",
    "\n",
    "t0 = time.time()\n",
    "for i in xrange(iters):\n",
    "    r = f()\n",
    "t1 = time.time()\n",
    "print(\"Looping %d times took %f seconds\" % (iters, t1 - t0))\n",
    "print(\"Result is %s\" % (r,))\n",
    "if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):\n",
    "    print('Used the cpu')\n",
    "else:\n",
    "    print('Used the gpu')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Looping 1000 times took 3.498123 seconds\r\n",
      "Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761\r\n",
      "  1.62323284]\r\n",
      "Used the cpu\r\n"
     ]
    }
   ],
   "source": [
    "if sys.platform == 'win32':\n",
    "    !set THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 && python test_theano.py\n",
    "else:\n",
    "    !THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python test_theano.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using gpu device 0: Tesla C2075 (CNMeM is disabled)\n",
      "Looping 1000 times took 0.847006 seconds\n",
      "Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761\n",
      "  1.62323296]\n",
      "Used the gpu\n"
     ]
    }
   ],
   "source": [
    "if sys.platform == 'win32':\n",
    "    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py\n",
    "else:\n",
    "    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到 `GPU` 明显要比 `CPU` 快。\n",
    "\n",
    "使用 `GPU` 模式的 `T.exp(x)` 可以获得更快的加速效果："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting test_theano.py\n"
     ]
    }
   ],
   "source": [
    "%%file test_theano.py\n",
    "\n",
    "from theano import function, config, shared, sandbox\n",
    "import theano.sandbox.cuda.basic_ops\n",
    "import theano.tensor as T\n",
    "import numpy\n",
    "import time\n",
    "\n",
    "vlen = 10 * 30 * 768  # 10 x #cores x # threads per core\n",
    "iters = 1000\n",
    "\n",
    "rng = numpy.random.RandomState(22)\n",
    "x = shared(numpy.asarray(rng.rand(vlen), 'float32'))\n",
    "f = function([], sandbox.cuda.basic_ops.gpu_from_host(T.exp(x)))\n",
    "\n",
    "t0 = time.time()\n",
    "for i in xrange(iters):\n",
    "    r = f()\n",
    "t1 = time.time()\n",
    "print(\"Looping %d times took %f seconds\" % (iters, t1 - t0))\n",
    "print(\"Result is %s\" % (r,))\n",
    "print(\"Numpy result is %s\" % (numpy.asarray(r),))\n",
    "if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):\n",
    "    print('Used the cpu')\n",
    "else:\n",
    "    print('Used the gpu')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using gpu device 0: Tesla C2075 (CNMeM is disabled)\n",
      "Looping 1000 times took 0.318359 seconds\n",
      "Result is <CudaNdarray object at 0x7f7bb701fb70>\n",
      "Numpy result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761\n",
      "  1.62323296]\n",
      "Used the gpu\n"
     ]
    }
   ],
   "source": [
    "if sys.platform == 'win32':\n",
    "    !set THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 && python test_theano.py\n",
    "else:\n",
    "    !THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python test_theano.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "!rm test_theano.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 配置 .theanorc.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我们可以在个人文件夹下配置 .theanorc.txt 文件来省去每次都使用环境变量设置的麻烦：\n",
    "\n",
    "例如我现在的 .theanorc.txt 配置为：\n",
    "```\n",
    "[global]\n",
    "device = gpu\n",
    "floatX = float32\n",
    "\n",
    "[nvcc]\n",
    "fastmath = True\n",
    "flags = -LC:\\Miniconda\\libs\n",
    "compiler_bindir=C:\\Program Files (x86)\\Microsoft Visual Studio 11.0\\VC\\bin\n",
    "\n",
    "[gcc]\n",
    "cxxflags = -LC:\\Miniconda\\MinGW\n",
    "```\n",
    "\n",
    "具体这些配置有什么作用之后可以查看官网上的教程。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
